296 research outputs found

    Choosing the Field of Study in Post-Secondary Education: Do Expected Earnings Matter?

    This paper examines the determinants of the choice of major when the length of studies is uncertain, using a framework in which students entering post-secondary education are assumed to anticipate their future earnings. For that purpose, we use French data from the 1992 and 1998 Génération surveys collected by the Centre d'Etudes et de Recherches sur l'Emploi et les Qualifications (CEREQ, Marseille). Our econometric approach is based on a semi-structural three-equation model, which is identified through exclusion restrictions. In particular, we exploit exogenous variations in the earnings returns associated with the majors across the business cycle to identify the causal effect of expected earnings on the probability of choosing a given major. Relying on a three-component mixture distribution, we account for correlation between the unobserved individual-specific terms affecting preferences for the majors, the unobserved individual-specific factors entering the equation determining the length of studies within each major, and those affecting the labor market earnings equation. Following Arcidiacono and Jones (2003), we use the EM algorithm with a sequential maximization step to produce consistent parameter estimates. Simulating a 10 percent increase in expected earnings for each major suggests that expected earnings have a statistically significant but quantitatively small impact on the allocation of students across majors.
    Keywords: post-secondary education, major choice, returns to education, EM algorithm

    The Effect of Part-Time Work on Post-Secondary Educational Attainment: New Evidence from French Data

    In this paper, we provide new evidence on the effect of part-time work on post-secondary educational attainment. To do so, we use samples extracted from the French Labor Force Surveys conducted over the years 1992-2002. These samples are restricted to students in initial education who are following university studies and preparing an Associate, Bachelor's or Master's degree. We estimate probit models with two simultaneous equations accounting for part-time work while studying and for success on the final exam, along with the decision to continue the following year in one of the models. In one of the models, we take working time into account by distinguishing between jobs in which more or fewer than 16 hours are worked per week. To identify the effect of part-time work on educational attainment, we use variations across départements in low-skilled youth unemployment rates and in their interactions with the father's socio-economic status. Our results suggest a statistically significant and very large detrimental effect of holding a regular part-time job on graduation probability. Still, a complementary analysis shows that working while studying does not have any significant effect on the probability of continuing studies.
    Keywords: post-secondary educational attainment, students' labor supply, bivariate Probit models

    Who are you, you who speak? Transducer cascades for information retrieval

    This paper deals with a survey corpus. We present work on retrieving information about the speaker. We used finite state transducer cascades, and we present detailed results with an evaluation. This work is part of a French project to enhance the ESLO corpus (a sociolinguistic survey carried out in the city of Orléans). The survey was conducted in 1968, and the project aims to save the recordings in computer format, to transcribe them, and to enrich the transcription with annotations in XML format. This work was supported by a French ANR contract (ANR-06-CORP-023) and by European funds from Région Centre (FEDER). The corpus comprises a collection of 200 interviews with questions about life in the city of Orléans (How long have you lived in Orléans? What led you to live in Orléans? Do you like living in Orléans? etc.) and questions about the occupation or the family of the speaker, supplemented by recordings made in professional or private contexts. The recording situations vary: interviews, discussions between friends, hidden-microphone recordings, interviews with political, academic and religious personalities, and conversations between a social worker and parents at the Psycho Medical Center of Orléans. In total, we have 300 hours of speech, estimated at 4,500,000 words. More precisely, we worked on almost 120 transcribed hours representing 112 Transcriber XML files and 32 577 Kb. We worked on 105 files (31 004 Kb) and evaluated the results on 7 files (1 573 Kb, 5.1%). The transcription files have no punctuation marks, but the first letter of proper names is capitalized and acronyms are fully capitalized. We used the CasSys system (Friburger, Maurel, 2004), which processes texts with transducer cascades (Abney, 1996). The cascades we used are hand-built: each transducer describes a local grammar for the recognition of some entities. Sometimes this recognition requires a succession of two or more transducers, in a specific order. We used two cascades. The first one, for named entity recognition, was built some years ago for a newspaper corpus, and we adapted it to the oral corpus in the project. The second one aimed at discovering information about the speaker in three domains: origin (is he/she a native of Orléans, or where does he/she come from?), family (is he/she married, with children or not?) and occupation (what is his/her occupation? where does he/she work?). We called this information designating entities. This second cascade was built specifically for the project. CasSys runs transducers with the Unitex software (Paumier, 2003), which requires the text to be segmented during preprocessing. For written text, this segmentation usually relies on sentence boundary detection (Friburger et al., 2000). Our corpus has no punctuation, so we chose to use the Transcriber XML tags to do the segmentation and also to hide the tag contents during the named entity task, as they are sometimes ambiguous with context entities (Dister, 2007)
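
    The idea of a cascade in which a later transducer depends on the output of an earlier one can be sketched as follows. This is only a toy illustration, assuming regular-expression rewrite rules in place of real Unitex graphs; the tag names `loc` and `origin`, the rule list `CASCADE` and the helper `apply_cascade` are hypothetical and do not come from CasSys.

```python
import re

# Hypothetical two-stage cascade. Each stage is a (pattern, replacement) pair
# standing in for one hand-built local grammar. These rules are illustrative
# assumptions, not the grammars used in the project.
CASCADE = [
    # Stage 1: named entity recognition (a tiny, closed list of city names).
    (re.compile(r"\b(Orléans|Tours)\b"), r"<loc>\1</loc>"),
    # Stage 2: speaker origin; it only fires on text already tagged by stage 1.
    (re.compile(r"born in <loc>[^<]+</loc>"), r"<origin>\g<0></origin>"),
]


def apply_cascade(text, cascade):
    """Apply each rewrite stage in order, feeding its output to the next stage."""
    for pattern, replacement in cascade:
        text = pattern.sub(replacement, text)
    return text


# Unpunctuated input with capitalized proper names, as in the transcriptions.
tagged = apply_cascade("i was born in Orléans", CASCADE)

# Running the same stages in reverse order misses the origin annotation,
# because stage 2 needs the <loc> tag produced by stage 1.
reversed_tagged = apply_cascade("i was born in Orléans", list(reversed(CASCADE)))
```

    In this sketch the ordering constraint is explicit: the origin rule matches tagged text only, so it must run after the named entity rule.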

    La formation des gentilés sur Internet

    This paper deals with the formation of demonyms (names of inhabitants) in French, whose form commonly appears irregular (e.g. Palois from Pau). It shows that their formation, by suffixation, is quite regular and mainly follows regular processes such as truncation, epenthesis and allomorphy. After a brief delimitation of the framework, the problem of the lack of awareness of many of these names is set out. As a result, new demonym forms are being created on the Internet, where many of them have been collected. Their existence and form prove the regularity of the field

    Enrichment of Renaissance texts with proper names

    The Renom project proposes to enrich Renaissance texts with proper names. These texts present two new challenges: great diversity due to the varied spellings of words, and numerous XML-TEI tags that preserve the exact format of the original edition. The task consisted in adding Named Entity tags to this format, generally tagging the left context and sometimes the right context of a name. To do that, we improved the free and open source program CasSys to parse texts with Unitex graph cascades, and we built dictionaries and specific cascades. The slot error rate was 6.1%. Proper Names and maps. were to allow navigating into. So, this paper deals with Named Entity Recognition in Renaissance texts
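
    The abstract reports a slot error rate without defining it. A minimal sketch, assuming the usual definition (substitutions plus deletions plus insertions, divided by the number of slots in the reference annotation); the function name and the counts below are illustrative assumptions, not figures from the paper.

```python
def slot_error_rate(substitutions, deletions, insertions, reference_slots):
    """Return (S + D + I) / N, where N is the number of reference slots."""
    if reference_slots <= 0:
        raise ValueError("reference_slots must be positive")
    return (substitutions + deletions + insertions) / reference_slots


# Made-up counts chosen only to show the arithmetic: 61 errors over 1000 slots.
example_ser = slot_error_rate(substitutions=30, deletions=20, insertions=11,
                              reference_slots=1000)
```

    Because insertions are counted against the reference size, this measure can exceed 1 when a system produces many spurious entities.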

    The Impact of Students' Paid Employment on Pursuit and Completion of University Studies

    This paper estimates the effects of students' paid employment on their success at university and on their decision to pursue their studies. Our analysis is based on samples extracted from INSEE Labour Force Surveys conducted between 1992 and 2002. The samples are restricted to students who have begun their university studies and are preparing a first- or second-stage degree. We exclude students whose jobs are linked to their studies, particularly apprentices under contract and employment program trainees. The results show that holding a regular job significantly reduces the probability of graduating at the end of the academic year. If they did not work, students in paid employment would have a 43-point higher probability of completing their academic year successfully. An additional analysis shows that combining a job with studies does not significantly influence the probability of pursuing one's studies in the following year.
    Keywords: Post-Secondary Educational Attainment, Students' Labour Supply, Bivariate Probit Models

    A Semi-automatic and Low Cost Approach to Build Scalable Lemma-based Lexical Resources for Arabic Verbs

    This work presents a method that enables the Arabic NLP community to build scalable lexical resources. The proposed method is low-cost and time-efficient, in addition to being scalable and extensible. The latter is reflected in the method's ability to be incremental in both respects: processing resources and generating lexicons. Starting from a corpus, tokens are first drawn from the corpus and lemmatized. Secondly, finite state transducers (FSTs) are generated semi-automatically. Finally, the FSTs are used to produce all possible inflected verb forms with their full morphological features. One of the algorithm's strengths is its ability to generate transducers with 184 transitions, which would be very cumbersome to design manually. The second strength is a new inflection scheme for Arabic verbs, which increases the efficiency of the FST generation algorithm. The experiment uses a representative corpus of Modern Standard Arabic. The number of semi-automatically generated transducers is 171. The coverage of the resulting open lexical resources is high: they cover more than 70% of Arabic verbs. The resources contain 16,855 verb lemmas and 11,080,355 fully vocalized, partially vocalized and unvocalized inflected verb forms. All these resources are being made public and are currently used as an open package in the Unitex framework, available under the LGPL license

    Explorer des corpus à l'aide de CasSys. Application au Corpus d'Orléans

    This article presents CasSys, a corpus exploration tool that linguists can easily configure, which recognizes even complex patterns and marks them up, optionally with XML tags. This automatic markup can then be revised by an expert. CasSys is therefore a corpus exploration tool, but also a tool for semi-supervised enriched annotation. Two real examples complete this presentation: the search for named entities in the Corpus d'Orléans, and the use of these entities to obtain information about the people who answered the survey that constitutes this corpus. This work was funded by the ANR Variling project and by a Feder Région Centre project. It was also tested as part of the Ester2 evaluation (an evaluation campaign for enriched transcription systems for radio broadcasts)

    Editorial

    The lexicographic treatment of proper names is problematic for several reasons. First, whereas many dictionaries exist for common nouns in electronic or paper form, the choice is much more limited for proper names, all the more so in a multilingual environment. Second, the information given about a proper name is very often encyclopedic in nature, whereas very often it is syntactico-semantic information..

    Une ontologie multilingue des noms propres

    This paper describes a multilingual ontology of proper names divided into two parts: an upper part shared by all the languages covered and a lower part specific to each of them. It includes, on the one hand, three semantic relations (Synonymy, Meronymy and Predication) and, on the other hand, morphosyntactic information